Down the rabbit hole: I just wanted to write a videogame

Posted by pulkomandy on Sat Jul 1 17:02:59 2023 • Comments (0) •

It is one of these weeks where nothing goes as planned.

Rabbit hole level 0: I wanted to write videogames

For some time now I've been part of a team trying to better document the VTech V.Smile console and make it easier to write games for it. They contacted me because I had some experience (and blog articles) about other VTech hardware.

The current efforts are documented in my VTech wiki. Most of the work was already done several years ago: datasheet and schematics were found, hardware was documented, games were dumped, emulators were written. But there was no documentation and no opensource tools to build new games, or at least, nothing quite production-ready. The only option would be the compiler suite provided by the CPU manufacturer (this is a custom CPU core, used in a few other game consoles).

So, after writing the documentation in the wiki, I started to experiment with writing an assembler and compiler. I initially started looking into vasm and vbcc, because my experience with these in the past had been rather good. The developers are helpful and the code is understandable by me and designed to make adding more CPU architectures easy.

Rabbit hole level 1: I need a C compiler

I quickly ran into problems with vasm, however. The CPU in the V.Smile is a purely 16-bit thing, which means it can't address individual bytes. While vasm has some support for this in the code, it was never used, and in fact, does not work. I discussed this with the vasm developers, and the solution they suggested is that all addresses in the assembler code should be prefixed with some special character, and the assembler frontend can multiply or divide them by two as needed.

I looked into that, but decided it would make writing assembler code more complicated and annoying than needed, as there is a risk of forgetting the marker and suddenly having your address all wrong.

On vbcc side, I did not have much problems, the porting guide is very complete and there is not too much work needed to get a basic version of the compiler running. But, without an assembler, it is not very useful. I did some experiments with Mikke Kohn's naken_asm, which has support for the UNSP CPU used in the V.Smile, but it is a simple assembler that can only directly generate a final binary. It has no support for temporary .o files and a linker. So in my tests I had to let the compiler generate assembler files and not assemble them, and also generate them in a way that they could be concatenated together at "link" stage before being assembled into a binary.

I got this to work for simple cases, but it is not great to work this way.

Rabbit hole level 2: I need an assembler and linker

I let the project sit for a while (I think it's been about a year?) hoping that someone else would do it(tm) or I would find a more suitable assembler somehow. I looked into ASXXXX, but this looks somewhat limited and not super easy to port.

So, eventually, I decided, if I'm going to port somethign not super easy, I may as well go for the Real Thing, and port GNU binutils. My research showed me that there is a porting guide, even if it's fairly short. And I think I have spent enough time doing low-level stuff (compilers, assemblers, wriing linker scripts, baremetal programming on AVR and ARM) that this should be within my reach. And so I cloned the git repository and started following the guide.

After just a few hours, I had something compiling and generating various executables: assembler, ar, objdump, etc for my architecture. I don't expect any of these to actually work, I started by just filling in empty functions and adjusting the buildsystem to get it to compile all things. The idea is then to run each of them, find what doesn't work, and add the missing functions as I go.

Binutils comes with a test suite, so I thought I would start by running that, look at all the failing tests, and fix them by adding bits of the code for my port, looking at how it's done for other CPUs.

Rabbit hole level 3: running the binutils test suite

This doesn't look too complicated: install the needed software, run "make check", investigate and fix bugs, and repeat.

So I went ahead and installed DejaGNU and expect which form the base of the testing framework. I then ran "make check" and… the testsuite immediately failed.

I had not heard of DejaGNU before, it seems to be a set of extensions to expect used to run tests on cross-development environments, typically, compile software on one computer, run it on another, and check that the results are as expected. I am not sure if anyone else uses it outside of binutils and gdb.

In any case, it is written in expect, which itself is written in TCL. And in the binutils case, it is also intertwined with the binutils build system which is written using autotools (and a specific version of it).

Rabbit hole level 4: learning to use expect

So my next step was trying to run a simple "expect" program. I quickly found that expect was completely broken, and it was a known problem with a bugreport opened at Haikuports since 2021. I have not mentionned that I am doing all this using the Haiku operating system, I would not run into these problems if I had chosen a more stable and finished operating system. But where would be the fun in that?

Anyway, so expect doesn't know how to open a PTY to communicate with another process (which is the main thing it is designed to do: spawn a process, read its output, match that with some regular expressions, and reply with some input according to a script).

A quick look at the code and buildsystem helped me find the problem: expect can handle many ways to open PTYs, and on Haiku, the preferred one was not picked because it requires linking an extra library that the expect configure script could not figure out. I quickly fixed that and… immediately hit another bug.

Rabbit hole level 5: coreutils

Now expect would correctly open a PTY, but it would fail to configure it. I once again dug into the sourcecode and found that it does this by running "stty sane" using the system() call. So I ran that same command in my shell, and indeed was greeted with the exact same error message.

Quick sidenote: I found the use of "stty sane" using strace and looking for calls to the exec system call. This almost didn't work: support for printing the command line of the executed command for exec in strace was added in Haiku by another developer just 3 weeks ago. So that's one rabbit hole jumped over, yay!

stty is a standard command provided by GNU coreutils (in Haiku at least, other operating systems may have their own version or one written by someone else under a different license or using a different programming language).

The expectation is that coreutils will detect and check a lot of things about the OS in their configure script while building, and compile the tools in a way that works for each system. But, they didn't handle the case where termios.h defined speed_t to be an unsigned char type. They are trying to set speed_t variables to -1 and later compare them equal to -1, and due to integer promotion rules in C, this is not the case. If someone is trying to tell you Javascript makes no sense, if you want them to go away, tell them about C integer promotion rules.

Anyway, I added the missing type cast, and stty started working. I thought I was finally ready to go one level up the rabbit hole towards the surface. I was wrong.

Rabbit hole level 4-and-a-half: expect again

I installed my newly built coreutils on my system, ran expect again, tried to run a child process, and this time, not only expect would start, but I managed to read the output from the launched program.

I then returned to the binutils test suite and ran 'make check' again. This time, it ran 2 tests, and the 3rd one made it stop waiting for something. I was a bit annoyed, not only because I had already fixed more bugs than I wanted to, but also because I was not too sure which part of the stack was wrong this time.

Eventually I found how to enable expect debug mode, and found which command it was running. I confirmed that the same command, ran standalone, returned immediately and with the correct results. So that wasn't a problem and I turned my attention to the test framework.

I studied the DejaGNU script for the failing test, and, while it took some time to peel all the layers, eventually I found that it was something quite simple: run 'ar' with some arguments, wait for the command to complete, and then check the output file. The failing part was 'wait for the command to complete'.

After some more experimentation with expect, I wrote a two line script that reproduces the issue. I ran it on Linux and confirmed that it has no problem there. Since that script is short, here is a copy of it:

spawn echo
expect eof

So basically, we start the 'echo' command and wait for it to terminate. And expect doesn't noticed that it terminates. 10 seconds later, there is a timeout (that doesn't happen in the coreutils tests because they set the timeout to 300 seconds instead of 10).

I turned to strace again, but I could not see a lot more. I also tried to follow the code in expect and in the tcl interpreter, but I quickly got lost. So I opened a support request on the expect bugtracker describing my problem, and went to sleep.

The next day, I had some answers from expect developers, mainly suggesting things that I had already tried but not included in my short ticket, so I shared the info (strace output) with them. And my fresher brain after a night of sleep also helped looking at things in more details. I know that expect uses a PTY to communicate with the spawned process, and so I decided to write a simple test program to do something similar with less "moving parts" involved: spawn a child process attached to a PTY, let it exit, and verify that the parent process waiting on the other side of the PTY is notified that the child is done.

Rabbit hole level 6: PTYs and poll

So I picked an example of PTY usage and started modifying it to my needs. And, I could easily reproduce the problem. Once again I made sure to run the program on Linux and Haiku to compare outputs. On Linux, when the child process exits, the PTY is closed and the poll in the parent process is notified. On Haiku, this does not seem to be the case, and so this program remains locked waiting forever. However, removing the poll call, a read call does not block, and properly returns an end of file. So it is just a problem of notifying the process waiting on poll that the file descriptor it is waiting on is now closed.

Now the next step is to fix that bug in Haiku. And, even if I do that, I don't know if it will also fix the problem in expect, as I was not able to find where in Tcl the waiting for file descriptors is handled.

So, as of now, I don't know if this rabbit hole has more rooms for me to explore, or if I will find my way up at least one level. Maybe I will lose interest in this and do other things for a few months before I get back to it. And probably I will uncover many more rabbit holes.

Conclusion

For people who think Haiku should not be in "beta" releases, I hope this helps you understand what we mean when we tell Haiku is not finished. It is not a safe ground to build any software on. Sure, a lot of commercial systems don't do any better, or didn't in the past, but still, the other options currently available aren't that bad nowadays. And not everyone is willing to get depp into these things like I do.

For people who wanted to use my C compiler to port games to the V.Smile: well, if you don't run Haiku, you can stay compfortably at level 1 or 2 of this rabbit hole and still be of help. If someone else was porting this assembler and compiler, I wouldn't need to run the binutils testsuite and all the deeper levels could be skipped. For now, at least.

For myself: sometimes it feels like I'm making no progress, but that's not true. It's just a lot of work in directions I didn't initially plan to go in. And such things are probably helpful for future projects as well. Also: I am surprised there were not more complaints about expect not working, and about PTYs being broken on Haiku. I thought these would be used a bit more often in typical UNIX toolchains?

PulkoMandy's homepage